{
 "metadata": {
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": 3
  },
  "orig_nbformat": 2,
  "kernelspec": {
   "name": "python_defaultSpec_1596071695180",
   "display_name": "Python 3.7.1 64-bit ('base': conda)"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2,
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Cluster Interface / Interface Similarity\n",
    "\n",
    "## Data Resources\n",
    "\n",
    "* (2007) [PISA](https://www.ebi.ac.uk/pdbe/pisa/)\n",
    "* (2011, 2020) [ProtCID](http://dunbrack2.fccc.edu/ProtCiD/Default.aspx)\n",
    "* (2012) [InterEvol](http://biodev.cea.fr/interevol/interevol.aspx)\n",
    "* (2014) [PIFACE](http://prism.ccbb.ku.edu.tr/piface/)\n",
    "* (2014) [EPPIC](http://www.eppic-web.org/ewui/#)\n",
    "* (2017) [~~QSbio~~](http://www.qsbio.org/)\n",
    "\n",
    "## Methods\n",
    "\n",
    "* (2004) [MultiProt](https://onlinelibrary.wiley.com/doi/abs/10.1002/prot.10628)\n",
    "* (2005) [TM-Align](https://doi.org/10.1093/nar/gki524)\n",
    "* (2009) [MM-align](https://doi.org/10.1093/nar/gkp318)\n",
    "* (2010) [iAlign](http://doi.org/10.1093/bioinformatics/btq404)\n",
    "* (2015) [PCalign](https://doi.org/10.1186/s12859-015-0471-x)\n",
    "* (2015) [PROSTA-inter](https://doi.org/10.1093/bioinformatics/btv242)\n",
    "* (2018) [InterComp](https://doi.org/10.1093/bioinformatics/bty587)\n",
    "* (2018) [PatchBag](https://doi.org/10.1038/s41598-018-26497-z)\n",
    "\n",
    "## Other Insights: Biological vs. Crystal Interfaces\n",
    "\n",
    "* (2018) [Integrating co‐evolutionary signals and other properties of residue pairs to distinguish biological interfaces from crystal contacts](https://onlinelibrary.wiley.com/doi/full/10.1002/pro.3448)\n",
    "* (2018) [Distinguishing crystallographic from biological interfaces in protein complexes: role of intermolecular contacts and energetics for classification](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2414-9)\n",
    "* (2019) [Accurate Classification of Biological and non-Biological Interfaces in Protein Crystal Structures using Subtle Covariation Signals](http://dx.doi.org/10.1038/s41598-019-48913-8)\n",
    "\n",
    "## References\n",
    "\n",
    "1. Xu Q, Dunbrack RL. ProtCID: a data resource for structural information on protein interactions. Nat Commun. 2020;11:711. doi:10.1038/s41467-020-14301-4\n",
    "2. Gao M, Skolnick J. iAlign: a method for the structural comparison of protein-protein interfaces. Bioinformatics. 2010;26(18):2259-2265. doi:10.1093/bioinformatics/btq404\n",
    "3. Cukuroglu E, Gursoy A, Nussinov R, Keskin O. Non-redundant unique interface structures as templates for modeling protein interactions. PLoS One. 2014;9(1):e86738. doi:10.1371/journal.pone.0086738\n",
    "4. Baskaran K, Duarte JM, Biyani N, et al. A PDB-wide, evolution-based assessment of protein-protein interfaces. BMC Struct Biol. 2014;14:22. doi:10.1186/s12900-014-0022-0\n",
    "5. Krissinel E, Henrick K. Inference of macromolecular assemblies from crystalline state. J Mol Biol. 2007;372(3):774-797. doi:10.1016/j.jmb.2007.05.022\n",
    "6. Zhang Y, Skolnick J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 2005;33(7):2302-2309. doi:10.1093/nar/gki524\n",
    "7. Cui X, Naveed H, Gao X. Finding optimal interaction interface alignments between biological complexes. Bioinformatics. 2015;31(12):i133-i141. doi:10.1093/bioinformatics/btv242\n",
    "8. Mirabello C, Wallner B. Topology independent structural matching discovers novel templates for protein interfaces. Bioinformatics. 2018;34(17):i787-i794. doi:10.1093/bioinformatics/bty587\n",
    "9. Budowski-Tal I, Kolodny R, Mandel-Gutfreund Y. A Novel Geometry-Based Approach to Infer Protein Interface Similarity. Sci Rep. 2018;8(1):8192. doi:10.1038/s41598-018-26497-z\n",
    "10. Faure G, Andreani J, Guerois R. InterEvol database: exploring the structure and evolution of protein complex interfaces. Nucleic Acids Res. 2012;40(Database issue):D847-D856. doi:10.1093/nar/gkr845\n",
    "11. Elez K, Bonvin AMJJ, Vangone A. Distinguishing crystallographic from biological interfaces in protein complexes: role of intermolecular contacts and energetics for classification. BMC Bioinformatics. 2018;19(Suppl 15):438. doi:10.1186/s12859-018-2414-9\n",
    "12. Fukasawa Y, Tomii K. Accurate Classification of Biological and non-Biological Interfaces in Protein Crystal Structures using Subtle Covariation Signals. Sci Rep. 2019;9(1):12603. doi:10.1038/s41598-019-48913-8\n",
    "13. Dey S, Ritchie DW, Levy ED. PDB-wide identification of biological assemblies from conserved quaternary structure geometry. Nat Methods. 2018;15(1):67-72. doi:10.1038/nmeth.4510\n",
    "14. Hu J, Liu HF, Sun J, Wang J, Liu R. Integrating co-evolutionary signals and other properties of residue pairs to distinguish biological interfaces from crystal contacts. Protein Sci. 2018;27(9):1723-1735. doi:10.1002/pro.3448\n",
    "15. Shatsky M, Nussinov R, Wolfson HJ. A method for simultaneous alignment of multiple protein structures. Proteins. 2004;56:143-156. doi:10.1002/prot.10628\n",
    "16. Cheng S, Zhang Y, Brooks CL. PCalign: a method to quantify physicochemical similarity of protein-protein interfaces. BMC Bioinformatics. 2015;16:33. doi:10.1186/s12859-015-0471-x\n",
    "17. Mukherjee S, Zhang Y. MM-align: a quick algorithm for aligning multiple-chain protein complex structures using iterative dynamic programming. Nucleic Acids Res. 2009;37(11):e83. doi:10.1093/nar/gkp318\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [],
   "source": [
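    "import nglview\n",
    "import pandas as pd\n",
    "try:\n",
    "    import ujson as json  # optional, faster drop-in for the stdlib json module\n",
    "except ImportError:\n",
    "    import json\n",
    "\n",
    "# NGL representations for the interface view. `surface` and `interface` are NGL\n",
    "# selection strings defined in the selection cell below, so run that cell first.\n",
    "rep1 = [\n",
    "    # faint wireframe over all solvent-exposed (surface) residues, colored by chain\n",
    "    {\"type\": \"line\", \"params\": {\n",
    "        \"sele\": surface, \"color\": \"chainindex\", \"opacity\": 0.2\n",
    "    }},\n",
    "    # semi-transparent spheres for the interface residues\n",
    "    {\"type\": \"spacefill\", \"params\": {\n",
    "        \"sele\": interface, \"color\": \"chainindex\", \"opacity\": 0.3\n",
    "    }},\n",
    "    # interface residues as lines, colored along the residue index\n",
    "    {\"type\": \"line\", \"params\": {\n",
    "        \"sele\": interface, \"color\": \"residueindex\"\n",
    "    }},\n",
    "    # light molecular surface over the interface\n",
    "    {\"type\": \"surface\", \"params\": {\n",
    "        \"sele\": interface, \"color\": \"chainindex\", \"opacity\": 0.1\n",
    "    }}\n",
    "]"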
   ]
  },
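  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because `rep1` reads the globals `surface` and `interface`, the cell above only runs after the selection cell further down. A minimal sketch, assuming the same four layers as `rep1`: the hypothetical helper `make_interface_reps` (not part of nglview) builds the list from selection strings passed in explicitly, so the view could be set with `view.representations = make_interface_reps(surface, interface)` once those selections exist."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical helper, not part of nglview: same layers as rep1, selections passed in.\n",
    "def make_interface_reps(surface_sele, interface_sele):\n",
    "    '''Return nglview representation dicts for a surface/interface view.'''\n",
    "    def rep(kind, sele, color, opacity=None):\n",
    "        params = {'sele': sele, 'color': color}\n",
    "        if opacity is not None:\n",
    "            params['opacity'] = opacity\n",
    "        return {'type': kind, 'params': params}\n",
    "\n",
    "    return [\n",
    "        rep('line', surface_sele, 'chainindex', 0.2),        # faint surface wireframe\n",
    "        rep('spacefill', interface_sele, 'chainindex', 0.3),  # interface spheres\n",
    "        rep('line', interface_sele, 'residueindex'),          # interface lines\n",
    "        rep('surface', interface_sele, 'chainindex', 0.1),    # light interface surface\n",
    "    ]\n",
    "\n",
    "\n",
    "reps = make_interface_reps('(10 and ^) and :A', '(12 and ^) and :B')\n",
    "print([r['type'] for r in reps])  # ['line', 'spacefill', 'line', 'surface']"
   ]
  },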
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "output_type": "display_data",
     "data": {
      "text/plain": "NGLWidget(background='#212121')",
      "application/vnd.jupyter.widget-view+json": {
       "version_major": 2,
       "version_minor": 0,
       "model_id": "207dbee99ecd455fa3d6aa98a598bdc9"
      }
     },
     "metadata": {}
    }
   ],
   "source": [
    "view = nglview.show_file(\"./pdb_files/1u7f.cif\")\n",
    "view.background = '#212121'\n",
    "view.representations = rep1\n",
    "view"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![fig](../docs/figs/1u7f_A_B_interface.png)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [],
   "source": [
    "view.clear_representations()\n",
    "# ab_dict and ab_s_dict come from the grouping cells further down.\n",
    "# One chain's residues as an NGL selection, e.g. \"((352 and ^) or (355 and ^)) and :B\"\n",
    "chain_res_tem = \"({res_str}) and :{chain_id}\"\n",
    "a_ab = chain_res_tem.format(res_str=ab_dict['A'], chain_id='A')\n",
    "b_ab = chain_res_tem.format(res_str=ab_dict['B'], chain_id='B')\n",
    "# B-C and A-C interfaces; these need bc_dict and ac_dict, which are not built by default\n",
    "# b_bc = chain_res_tem.format(res_str=bc_dict['B'], chain_id='B')\n",
    "# c_bc = chain_res_tem.format(res_str=bc_dict['C'], chain_id='C')\n",
    "# a_ac = chain_res_tem.format(res_str=ac_dict['A'], chain_id='A')\n",
    "# c_ac = chain_res_tem.format(res_str=ac_dict['C'], chain_id='C')\n",
    "\n",
    "# solvent-exposed (surface) residues of chains A and B\n",
    "a_ab_s = chain_res_tem.format(res_str=ab_s_dict['A'], chain_id='A')\n",
    "b_ab_s = chain_res_tem.format(res_str=ab_s_dict['B'], chain_id='B')\n",
    "\n",
    "i_chains = ' or '.join(f\"({i})\" for i in (a_ab, b_ab))  # add b_bc, c_bc, a_ac, c_ac for all three interfaces\n",
    "s_chains = ' or '.join(f\"({i})\" for i in (a_ab_s, b_ab_s))\n",
    "# keep atoms without an alternate location (%), in the first model (/0), protein only\n",
    "interface = f\"({i_chains}) and % and /0 and protein\"\n",
    "surface = f\"({s_chains}) and % and /0 and protein\"\n",
    "\n",
    "view.add_cartoon(selection=\"(:A or :B) and protein\", color=\"chainindex\", opacity=0.5)\n",
    "view.add_spacefill(selection=\"(:A or :B) and protein\", color=\"gray\", opacity=0.1)\n",
    "view.add_spacefill(selection=surface, color=\"chainindex\", opacity=0.3)\n",
    "view.add_surface(selection=interface, color=\"chainindex\")\n",
    "# view.add_surface(selection=interface, color=\"residueindex\", opacity=0.05)"
   ]
  },
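  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cell above writes out each chain's selection by hand and only combines the A-B interface. A minimal sketch, assuming residues are available as (author residue number, insertion code) pairs like those behind `check['pic']`: the hypothetical helpers `chain_selection` and `interface_selection` assemble a selection with the same `and % and /0 and protein` suffix for any number of chains, skipping chains that have no residues."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical helpers: build NGL selections from (author residue number, insertion code) pairs,\n",
    "# using the same residue token as check['pic'] and the same suffix as `interface` above.\n",
    "def chain_selection(residues, chain_id):\n",
    "    tokens = [f'({num} and ^{ins})' for num, ins in residues]\n",
    "    body = ' or '.join(tokens)\n",
    "    return f'({body}) and :{chain_id}'\n",
    "\n",
    "\n",
    "def interface_selection(per_chain):\n",
    "    '''per_chain maps a chain ID to a list of (residue number, insertion code) pairs.'''\n",
    "    parts = [chain_selection(res, chain) for chain, res in per_chain.items() if res]\n",
    "    body = ' or '.join(f'({p})' for p in parts)\n",
    "    # no alternate location (%), first model (/0), protein atoms only\n",
    "    return f'({body}) and % and /0 and protein'\n",
    "\n",
    "\n",
    "print(chain_selection([(10, ''), (11, 'A')], 'A'))  # ((10 and ^) or (11 and ^A)) and :A\n",
    "print(interface_selection({'A': [(10, '')], 'B': [(352, '')], 'C': []}))"
   ]
  },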
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "# Column types for the PDBe-derived tables (IDs and insertion codes stay strings).\n",
    "converters = {\n",
    "    'pdb_id': str,\n",
    "    'chain_id': str,\n",
    "    'struct_asym_id': str,\n",
    "    'entity_id': int,\n",
    "    'author_residue_number': int,\n",
    "    'residue_number': int,\n",
    "    'author_insertion_code': str}\n",
    "\n",
    "DATA_DIR = Path(r\"C:\\Download\\20200716\\biounit\")  # local download folder\n",
    "\n",
    "eec_as_df = pd.read_csv(DATA_DIR / \"0725.tsv\", sep=\"\\t\", converters=converters)\n",
    "# PISA per-residue interface details for 1u7f\n",
    "check = pd.read_csv(\n",
    "    DATA_DIR / \"pisa%interfacedetail%+1u7f%1%3.tsv\",\n",
    "    sep=\"\\t\",\n",
    "    usecols=['pdb_code', 'assemble_code', 'interface_number', 'chain_id', 'residue', 'sequence', 'insertion_code', 'buried_surface_area', 'solvent_accessible_area', 'hsdc'],\n",
    "    na_values=[' ']\n",
    "    ).rename(columns={\"pdb_code\": \"pdb_id\",\n",
    "                      \"sequence\": \"author_residue_number\",\n",
    "                      \"insertion_code\": \"author_insertion_code\",\n",
    "                      \"residue\": \"residue_name\",\n",
    "                      \"chain_id\": \"struct_asym_id_in_assembly\"})\n",
    "# assign back: chained fillna(..., inplace=True) is deprecated in pandas 2.x\n",
    "check['author_insertion_code'] = check['author_insertion_code'].fillna('')\n",
    "\n",
    "chain_df_check = eec_as_df[eec_as_df.pdb_id.eq('1u7f') & eec_as_df.assembly_id.eq(1)]\n",
    "residues_check = pd.read_csv(DATA_DIR / \"pdb%entry%residue_listing%+1u7f.tsv\", sep=\"\\t\", converters=converters)\n",
    "# left joins on the shared column names add chain and residue annotations\n",
    "check = check.merge(chain_df_check, how=\"left\")\n",
    "check = check.merge(residues_check, how=\"left\")\n",
    "\n",
    "\n",
    "def annotate_pisa(df: pd.DataFrame) -> pd.DataFrame:\n",
    "    '''\n",
    "    Flag residues from the PISA per-residue areas (missing values count as 0):\n",
    "      buried residues:    ASA == 0\n",
    "      surface residues:   ASA > 0  -> pisa_surface = 1\n",
    "      interface residues: BSA > 0  -> pisa_interface = 1\n",
    "    '''\n",
    "    df['pisa_surface'] = (df.solvent_accessible_area > 0).astype(int)\n",
    "    df['pisa_interface'] = (df.buried_surface_area > 0).astype(int)\n",
    "    return df\n",
    "\n",
    "\n",
    "check = annotate_pisa(check)\n",
    "# check['pic'] = check.apply(lambda x: f\"P1_{x['residue_number']}_{x['residue_name']}\" if x['struct_asym_id'] == 'B' else f\"P2_{x['residue_number']}_{x['residue_name']}\", axis=1)\n",
    "# NGL residue token, e.g. '352 and ^' (no insertion code) or '11 and ^A'\n",
    "check['pic'] = check.author_residue_number.astype(str) + ' and ^' + check.author_insertion_code\n",
    "check_interface = check[check.pisa_interface.eq(1)]"
   ]
  },
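  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The flagging rule in `annotate_pisa` can be checked on a few hand-made rows before running it on the full PISA table. A minimal sketch with invented ASA/BSA values (not taken from 1u7f): a residue with ASA > 0 is flagged as surface, one with BSA > 0 as interface, and a missing area compares as False, so it is flagged 0."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Invented PISA-like rows for illustration only (not values from 1u7f).\n",
    "toy = pd.DataFrame({\n",
    "    'struct_asym_id_in_assembly': ['A', 'A', 'B', 'B'],\n",
    "    'author_residue_number': [10, 11, 352, 355],\n",
    "    'author_insertion_code': ['', 'A', '', ''],\n",
    "    'solvent_accessible_area': [0.0, 35.2, 12.5, None],\n",
    "    'buried_surface_area': [0.0, 20.1, 0.0, None],\n",
    "})\n",
    "\n",
    "# Same rule as annotate_pisa: ASA > 0 -> surface, BSA > 0 -> interface; NaN compares as False.\n",
    "toy['pisa_surface'] = (toy.solvent_accessible_area > 0).astype(int)\n",
    "toy['pisa_interface'] = (toy.buried_surface_area > 0).astype(int)\n",
    "toy['pic'] = toy.author_residue_number.astype(str) + ' and ^' + toy.author_insertion_code\n",
    "print(toy[['pic', 'pisa_surface', 'pisa_interface']])"
   ]
  },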
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [],
   "source": [
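    "# (residue_number, struct_asym_id) of each residue in the currently loaded PISA interface.\n",
    "# bc_i and ac_i are the same expression, evaluated after loading the B-C and A-C interface tables.\n",
    "ab_i = tuple(zip(check_interface.residue_number, check_interface.struct_asym_id))\n",
    "# bc_i = tuple(zip(check_interface.residue_number, check_interface.struct_asym_id))\n",
    "# ac_i = tuple(zip(check_interface.residue_number, check_interface.struct_asym_id))\n",
    "interface_residues = set(ab_i)  # use set(ab_i + ac_i + bc_i) once all three are defined\n",
    "# mmCIF-style atom_site records keyed by label_seq_id and label_asym_id\n",
    "info = {\"atom_site\": [dict(zip((\"label_seq_id\", \"label_asym_id\"), tp)) for tp in interface_residues]}\n",
    "# json.dumps(info)"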
   ]
  },
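  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Combining the interface tuples through a set removes residues shared by two interfaces, but a set has no defined order, so the dumped `atom_site` list can change between runs. A minimal sketch with invented tuples in place of `ab_i` and `bc_i` (the names `demo_ab`, `demo_bc` and `demo_info` are illustrative): de-duplicate, then sort by chain and residue number before building the records."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "\n",
    "# Invented (label_seq_id, label_asym_id) tuples standing in for ab_i and bc_i.\n",
    "demo_ab = ((10, 'A'), (11, 'A'), (352, 'B'))\n",
    "demo_bc = ((352, 'B'), (40, 'C'))\n",
    "\n",
    "# The set drops residues shared by both interfaces; sorting by (chain, number) fixes the order.\n",
    "demo_residues = sorted(set(demo_ab + demo_bc), key=lambda t: (t[1], t[0]))\n",
    "demo_info = {'atom_site': [{'label_seq_id': seq, 'label_asym_id': asym} for seq, asym in demo_residues]}\n",
    "print(json.dumps(demo_info))"
   ]
  },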
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [],
   "source": [
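    "def str_int_join(iterable):\n",
    "    '''Join NGL residue tokens into one 'or' expression, e.g. (352 and ^) or (355 and ^).'''\n",
    "    return ' or '.join(f\"({i})\" for i in iterable)\n",
    "\n",
    "\n",
    "# interface residues per chain of the currently loaded PISA interface (keys: chain IDs)\n",
    "ab_dict = check[check.pisa_interface.eq(1)].groupby('struct_asym_id_in_assembly').pic.apply(str_int_join).to_dict()\n",
    "# bc_dict and ac_dict: same expression after loading the B-C and A-C interface tables\n",
    "# bc_dict = check[check.pisa_interface.eq(1)].groupby('struct_asym_id_in_assembly').pic.apply(str_int_join).to_dict()\n",
    "# ac_dict = check[check.pisa_interface.eq(1)].groupby('struct_asym_id_in_assembly').pic.apply(str_int_join).to_dict()"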
   ]
  },
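  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The grouping step turns the per-residue tokens in `pic` into one selection fragment per chain, which is what `chain_res_tem` expects. A minimal sketch on invented rows (`demo` and `demo_dict` are illustrative names): it groups by the plain column name, so the dictionary keys are chain IDs, and uses `.agg` to make explicit that each group is reduced to a single string."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "\n",
    "def str_int_join(iterable):\n",
    "    return ' or '.join(f'({i})' for i in iterable)\n",
    "\n",
    "\n",
    "# Invented rows standing in for check: chain ID, NGL residue token, PISA interface flag.\n",
    "demo = pd.DataFrame({\n",
    "    'struct_asym_id_in_assembly': ['A', 'A', 'B', 'B'],\n",
    "    'pic': ['10 and ^', '11 and ^A', '352 and ^', '355 and ^'],\n",
    "    'pisa_interface': [1, 0, 1, 1],\n",
    "})\n",
    "\n",
    "# One 'or' expression of interface residues per chain, keyed by chain ID.\n",
    "demo_dict = (demo[demo.pisa_interface.eq(1)]\n",
    "             .groupby('struct_asym_id_in_assembly')['pic']\n",
    "             .agg(str_int_join)\n",
    "             .to_dict())\n",
    "print(demo_dict)  # {'A': '(10 and ^)', 'B': '(352 and ^) or (355 and ^)'}"
   ]
  },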
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "# solvent-exposed (surface) residues per chain, used for the surface selection\n",
    "ab_s_dict = check[check.pisa_surface.eq(1)].groupby('struct_asym_id_in_assembly').pic.apply(str_int_join).to_dict()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ]
}